A Comparison of Compiler Tiling Algorithms
Abstract
Linear algebra codes contain data locality that can be exploited by tiling multiple loop nests. Several approaches to tiling have been suggested for avoiding conflict misses in low-associativity caches. We propose a new technique based on intra-variable padding and compare its performance with existing techniques. Results show that padding improves the performance of matrix multiply by over 100% in some cases over a range of matrix sizes. Comparing the efficacy of different tiling algorithms, we discover that rectangular tiles are slightly more efficient than square tiles. Overall, tiling improves performance by 0% to 250%. Copying tiles at run time proves to be quite effective.
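As a rough illustration of what tiling a matrix multiply loop nest looks like, the sketch below restructures the loops so that a rectangular tile of one operand is reused across all rows before moving on. It is not the algorithm evaluated in the paper: the function names and the `tile_k` and `tile_j` parameters are hypothetical, and pure Python shows only the loop structure, not the cache behavior or the effect of padding.

```python
def matmul_naive(a, b):
    """Reference i-k-j matrix multiply on lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c


def matmul_tiled(a, b, tile_k=2, tile_j=4):
    """Tiled multiply: a tile_k x tile_j tile of b is reused for every row of a.

    tile_k and tile_j are illustrative parameters (assumptions, not values
    from the paper); choosing tile_k != tile_j gives a rectangular tile.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for kk in range(0, m, tile_k):            # tile origin along k
        k_end = min(kk + tile_k, m)           # clip the last, partial tile
        for jj in range(0, p, tile_j):        # tile origin along j
            j_end = min(jj + tile_j, p)
            for i in range(n):                # sweep all rows over this tile
                row_c = c[i]
                for k in range(kk, k_end):
                    aik = a[i][k]
                    row_b = b[k]
                    for j in range(jj, j_end):
                        row_c[j] += aik * row_b[j]
    return c
```

Both functions accumulate each output element in the same order of `k`, so they return identical results; in a compiled language the tiled version would differ only in how often the tile of `b` is reloaded from memory.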
Similar Resources
Communication-Minimal Tiling of Uniform Dependence Loops
Tiling is a loop transformation that a compiler uses to create automatically blocked algorithms in order to improve the benefits of the memory hierarchy and reduce the communication overhead between processors. Motivated by existing results, this paper presents a conceptually simple approach to finding tilings with a minimal amount of communication between tiles. The development of almost all r...
Imperfectly-Nested Loops Yonghong
This paper presents an integrated compiler framework for tiling a class of nontrivial imperfectly-nested loops such that cache locality is improved. We develop a new memory cost model to analyze data reuse in terms of both the cache and the TLB, based on which we compute the tile size with or without array duplication. We determine whether to duplicate arrays for tiling by comparing the respect...
Code Tiling for Improving the Cache Performance of PDE Solvers
For SOR-like PDE solvers, loop tiling either helps little in improving data locality or hurts their performance. This paper presents a novel compiler technique called code tiling for generating fast tiled codes for these solvers on uniprocessors with a memory hierarchy. Code tiling combines loop tiling with a new array layout transformation called data tiling in such a way that a significant am...
Using a Dynamic Schedule to Increase the Performance of Tiling in Stencil Computations
A stencil computation determines the values of points in a grid of some dimensionality by repeatedly evaluating a given function of a grid point and its neighbors. The parallelization and optimization of stencil computations are the subject of ongoing research. The most prevalent approach is the subdivision of the iteration domain into smaller pieces, called tiles. We give an overview of a method t...
Reducing Data Communication Overhead for Doacross Loop Nests
If the iterations of a loop nest cannot be partitioned into independent sets, data communication for the data dependences is inevitable when executing them on parallel machines. Loop nests of this kind are referred to as Doacross loop nests. This paper is concerned with compiler algorithms for parallelizing Doacross loop nests for distributed-memory multicomputers. We present a metho...
Publication date: 1999